Direct Preference Optimization (DPO) explained
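DPO trains a policy directly on preference pairs: for a prompt x with a chosen response y_w and a rejected response y_l, it minimizes -log σ(β·(log π_θ(y_w|x)/π_ref(y_w|x) − log π_θ(y_l|x)/π_ref(y_l|x))), where π_θ is the policy being trained, π_ref is a frozen reference model, and β controls how strongly the policy is pulled away from the reference. A minimal sketch of that per-example loss, assuming sequence log-probabilities have already been computed (the function name and argument names are illustrative, not from any particular library):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from sequence log-probabilities.

    Each argument is log p(response | prompt) summed over the
    response tokens, under either the trainable policy or the
    frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref),
    # here kept as log-ratios (beta is applied once below).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Bradley-Terry style logit: how much the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    logits = beta * (chosen_logratio - rejected_logratio)

    # -log sigmoid(logits): small when the chosen response is
    # strongly preferred, log(2) when the policy matches the reference.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy still equals the reference, both log-ratios are zero and the loss is log 2; as training pushes probability toward chosen responses and away from rejected ones, the loss falls toward zero. In practice this is computed in batches with an autograd framework, but the arithmetic per pair is exactly the above.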